CACHING ON THE INTERNET

 

By Lisa Sanger

 

Spring 1996

 

--------------------------------------------------------------------------------

 

INTRODUCTION

 

A newly joined Prodigy member logs on to the service for the first time and it takes several minutes for her computer to download the fancy homepage graphics. The next day she logs on again and this time the graphics pop up almost immediately. She wonders: Why did her computer perform the same download faster the second day?

 

While browsing the Internet, a Web surfer notices that his computer retrieves documents from the popular Electronic Frontier Foundation website much faster than it retrieves documents from his favorite, if rather eccentric, New Trends in Underwater Basket Weaving website. He wonders: Why does it take longer to download a less popular and less congested website?

 

HowTired, a popular online magazine, notes a sudden and inexplicable drop in "hit" rates (i.e. popularity statistics). HowTired wonders: Why have we lost these hits?

 

The answer to these three questions is: caching. Among other things, caching can speed up downloading on one's personal computer; caching can speed up connection time to popular sites on the Internet; and caching can mask websites' hit rates. Any person active on the Internet has probably both cached documents herself and received cached data from elsewhere on the Internet. Caching is widespread on the Internet, but it raises thorny issues under U.S. Copyright law.

 

In this paper I examine caching, the implications of U.S. Copyright law and potential resolutions. Part I defines the term caching. Part II discusses the benefits and drawbacks of caching on the Internet. Part III specifically explores whether caching constitutes copyright infringement. And, finally, Part IV considers whether proxy caching should continue.

 

I. CACHING: DEFINING THE TERM

 

Caching is a generic term meaning "to store." Whether a person caches web pages to avert Internet traffic, or a squirrel caches nuts for the winter, the concept is the same. When applied to the Internet, "caching" means "the copying of a web page, made incidental to the first access to the page, and storage of that copy for the purpose of speeding subsequent access."

 

There are two ways to cache web pages on the Internet: "client caching" and "proxy caching." Important functional differences exist between client caching and proxy caching, and these differences may in some instances influence the legal analysis. Unfortunately, people often forget to specify which type of caching they mean, leaving the entire discussion of caching quite muddled. In an effort to clarify the muddle, I will review the definitions of client caching and proxy caching.

 

A. Client Caching

 

Client caches reside within an individual user's Web browser software (such as Netscape or Mosaic). The client cache stores not only the documents currently displayed in browser screens, but also documents requested in the past. Client caching takes two forms: persistent and non-persistent. A persistent client cache retains its documents between invocations of the Web browser. Netscape uses a persistent cache. A non-persistent client cache (used in Mosaic) removes any memory or disk space used for caching when the user quits the browser.

 

The client caching process works similarly whether it maintains a persistent or a non-persistent cache. When the user requests a web page, the browser first checks whether the requested data already resides in the cache. If the cache has a copy of the requested data, the cache provides the data very quickly to the user. If the data is not in the cache, the browser fetches the item from the Internet and also stores a copy in the cache. The cache now has this data available if the user requests it again. The larger the cache, the more data the cache can store, and the more likely the cache will have the requested item.
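
The lookup sequence just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Netscape or Mosaic is actually implemented: the class name ClientCache, the fake_fetch function standing in for a real network request, and the example URL are all hypothetical.

```python
from typing import Callable, Dict


class ClientCache:
    """Toy browser cache: check the cache first, fetch from the network on a miss."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch              # hypothetical stand-in for a network request
        self._store: Dict[str, str] = {}
        self.network_fetches = 0         # how many times we actually went to the network

    def get(self, url: str) -> str:
        if url in self._store:           # hit: answer from the local copy
            return self._store[url]
        self.network_fetches += 1        # miss: go out to the Internet
        document = self._fetch(url)
        self._store[url] = document      # keep a copy for the next request
        return document

    def quit_browser(self, persistent: bool) -> None:
        # A non-persistent cache discards its contents when the browser exits;
        # a persistent cache keeps them for the next session.
        if not persistent:
            self._store.clear()


def fake_fetch(url: str) -> str:
    """Pretend download; a real browser would issue a network request here."""
    return f"<html>contents of {url}</html>"


client_cache = ClientCache(fake_fetch)
client_cache.get("http://example.com/home")   # first visit: fetched over the network
client_cache.get("http://example.com/home")   # second visit: served from the cache
print(client_cache.network_fetches)           # 1
```

Only the first request reaches the network; calling quit_browser(persistent=False) empties the store, so the next session starts with a miss again.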

 

B. Proxy Caching

 

The second form of caching, "proxy caching," takes place on a network used by the World Wide Web ("WWW" or "Web"). Proxy caches reside on machines in strategic places (typically gateways) in the network of the WWW. Both non-profit local access networks, and large for-profit service providers such as Prodigy, run proxy caches. Unlike client caching, which services only one client, proxy caches service many clients. Thus, proxy caching helps relieve the intense congestion now plaguing the Internet on a much grander scale than client caching.

 

Proxy servers act as intermediaries between local clients and remote content servers. Initially, scientists developed proxy servers to function as firewalls for security reasons. Today proxy servers may function as caching mechanisms as well as firewalls. A proxy server is a machine (or collection of machines) through which all traffic must pass.

 

When a user asks a client for a certain web page, the client heads out to the Internet. If there is a caching proxy, client requests go to the proxy server, not to the remote web server. The proxy checks whether it has already cached the requested page. If the proxy has a cached copy of the web page, it returns the page to the client directly (see Figure 1 below). Serving cached pages to clients is fast because it requires less Internet activity. In addition to helping individual clients and networks with proxy servers, caching also helps the remote web server: it reduces the computational load on the remote content server and makes it possible for that server to supply data to many more machines. If the proxy does not have a cached copy of the requested document, it goes out to the remote web server, retrieves the original, and passes the data back to the client while keeping a copy in its cache (see Figure 2 below). The "cache proxy" or "proxy cache" server is, at one and the same time, a "proxy" (it accepts requests from clients, and carries them out on the clients' behalf) and a "cache" (it keeps a copy of the documents that it retrieves, and fulfills subsequent requests from that copy where appropriate).
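
A rough Python sketch of this flow follows. It assumes a single shared in-memory cache and a made-up origin server that simply counts the requests it receives; the names OriginServer and CachingProxy and the example URL are illustrative and do not describe any real proxy product.

```python
from typing import Dict


class OriginServer:
    """Stand-in for a remote web server that counts the requests reaching it."""

    def __init__(self) -> None:
        self.requests_seen = 0

    def serve(self, url: str) -> str:
        self.requests_seen += 1
        return f"page body for {url}"


class CachingProxy:
    """Shared cache sitting between many clients and the remote server."""

    def __init__(self, origin: OriginServer) -> None:
        self._origin = origin
        self._cache: Dict[str, str] = {}

    def handle(self, client_id: str, url: str) -> str:
        # client_id is ignored on purpose: every client is answered
        # from the same shared cache.
        if url not in self._cache:
            # Miss: fetch the original on the client's behalf and keep a copy.
            self._cache[url] = self._origin.serve(url)
        # Hit (or freshly filled): answer from the cached copy.
        return self._cache[url]


origin = OriginServer()
proxy = CachingProxy(origin)
for n in range(1000):
    proxy.handle(f"client-{n}", "http://example.com/popular")
print(origin.requests_seen)  # 1
```

A thousand client requests produce a single request at the remote server, which illustrates both the reduced load on that server and the reason its access log undercounts readers.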

 

Figure 1. (omitted in HTML version)

 

Figure 2. (omitted in HTML version)

 

C. Distinguishing Caching from Archiving

 

In everyday practice, the line between the caching process and the archiving process can become gray. However, technically speaking, the two processes have distinct differences.

 

Caching entails a more automated process than archiving. A cache simply copies data which passes through it on the way to the client. The cache retains the copy in case the client, or another client in the same network, needs the document again. After a certain amount of time, the cache needs more space, so it clears the least recently used, or least used, page to make room for another requested document. The caching process -- theoretically -- maintains constant updating throughout the day. Caching's automated mass copying strives primarily to speed access to popular sites on the Internet.
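
The replacement step described above, clearing the least recently used page when space runs out, can be sketched as follows. This is a minimal illustration of one common policy, assuming the cache is limited by number of pages rather than by bytes; the class name LRUCache and the sample paths are hypothetical.

```python
from collections import OrderedDict
from typing import List, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used page."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._pages: "OrderedDict[str, str]" = OrderedDict()

    def get(self, url: str) -> Optional[str]:
        if url not in self._pages:
            return None
        self._pages.move_to_end(url)          # mark as most recently used
        return self._pages[url]

    def put(self, url: str, body: str) -> None:
        self._pages[url] = body
        self._pages.move_to_end(url)
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)   # drop the least recently used page

    def urls(self) -> List[str]:
        return list(self._pages)


lru = LRUCache(capacity=2)
lru.put("/a", "A")
lru.put("/b", "B")
lru.get("/a")        # /a is now more recently used than /b
lru.put("/c", "C")   # the cache is full, so /b is evicted
print(lru.urls())    # ['/a', '/c']
```

A "least used" policy would instead keep a request counter per page and evict the page with the smallest count.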

 

Archiving involves a more manual process where an individual affirmatively goes out and copies an entire other website or section of a website (not just separate pages requested by clients) onto her local server. Archiving stores a server's input while caching stores the server's output to clients. Also, archiving updates much less frequently than caching. Archiving aims to compile a library or historical resource of sorts; it does not aim to substitute copies in order to preserve bandwidth. In fact, mass archiving can consume a great deal of resources.

 

The line between caching and archiving or replicating becomes gray when a network or service provider configures its caching proxy server to permanently hold the most frequently asked for sites and keep them current. In this scenario the proxy server just permanently caches these pages and updates them either at specified intervals or whenever the TTL (Time to Live) encoded in the page expires. Although the line between caching and archiving or replicating blurs at certain points, it is important for the reader to note that caching does refer to a distinct process.
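
To make the TTL idea concrete, here is a small sketch of a cache that refreshes an entry only once its time to live has expired. It simplifies in two ways that are my own assumptions: one TTL applies to every page (rather than a value encoded in each page), and time comes from a fake clock so the example is deterministic. The names TTLCache, fetch_ticker, and the "/ticker" path are hypothetical.

```python
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache whose entries are refetched once their time to live has passed."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float]) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # url -> (body, expiry time)

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now < entry[1]:
            return entry[0]                  # still fresh: serve the cached copy
        body = self._fetch(url)              # missing or expired: refetch
        self._entries[url] = (body, now + self._ttl)
        return body


now = [0.0]        # fake clock, in seconds
version = [1]      # current version of the page on the remote site


def fetch_ticker(url: str) -> str:
    return f"{url} v{version[0]}"


ttl_cache = TTLCache(fetch_ticker, ttl_seconds=1800, clock=lambda: now[0])
first = ttl_cache.get("/ticker")    # fetched: version 1
version[0] = 2                      # the remote site changes its content
now[0] = 600.0                      # ten minutes later, TTL not yet expired
stale = ttl_cache.get("/ticker")    # cached copy is served: still version 1
now[0] = 1800.0                     # thirty minutes later, TTL expired
fresh = ttl_cache.get("/ticker")    # refetched: version 2
print(first, stale, fresh, sep=" | ")   # /ticker v1 | /ticker v1 | /ticker v2
```

The middle request shows the cost of this design: until the TTL runs out, users receive the older copy even though the remote site has changed.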

 

II. CACHING: BENEFITS AND DRAWBACKS

 

A. THE PROBLEM

 

Each WWW address specifies or implies a reference to one particular site on the Internet. This means that without some kind of additional machinery, whenever a person requests a specific WWW address, no matter where she is from and no matter how often others in her network request the same address, she will make a network call to that specific site, leading to unnecessarily high use of network links and excessive load on the servers for popular sites.

 

B. CACHING BENEFITS

 

High use of network lines and excessive load on popular servers lead to one of the single biggest problems experienced by Internet users today: lack of adequate bandwidth. Information abounds on the Internet, but the delay involved in retrieving that information frustrates many users. Until the Internet infrastructure upgrades to bigger "pipes" which can transmit greater amounts of information in the same amount of time, Web surfers must look to other means to relieve the congestion. Caching helps to relieve Internet congestion in five ways: 1) caching expedites user access time; 2) caching decreases the amount of bandwidth each user uses; 3) caching decreases bandwidth used on the Internet generally; 4) caching decreases bandwidth used on network servers; and 5) caching decreases bandwidth used on remote servers. Caching creates social benefits (by reducing traffic on the Internet generally) and it creates private benefits (by saving time and conserving bandwidth for individual users and servers). Thus, caching improves the Internet for everyone.

 

C. CACHING DRAWBACKS

 

Caching has three main drawbacks: 1) caching inhibits websites' ability to calculate hits and page impressions; 2) caching may result in promulgation of stale documents; and 3) caching may constitute copyright infringement.

 

1. Hits and Page Impressions. A "hit" is a form of measurement on the WWW. Websites measure one "hit" for each time a client (or proxy server) requests a file from their server. Note that the number of files on a web page varies greatly from page to page. The use of graphics adds many files to a single page. Thus, if a user requests a page without any graphics, the website may measure only a single hit, but if a user requests a page with many graphics, the website may measure 30 or more hits. Instead of hits, websites may also count "page impressions" as measurements of usage. Websites using this system measure one "page impression" for each time a user or proxy server requests a page from a website's server. Page impressions avoid the file discrepancy problem that occurs with hits.
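
The difference between the two measurements can be shown with a short sketch over a made-up access log. It assumes, purely for illustration, that a "page" is any requested file whose name ends in ".html" and that everything else is an embedded graphic; the function name and file paths are hypothetical.

```python
from typing import List, Tuple


def count_hits_and_impressions(requested_files: List[str]) -> Tuple[int, int]:
    """Return (hits, page impressions) for a list of requested file paths."""
    hits = len(requested_files)                       # one hit per file requested
    impressions = sum(1 for path in requested_files   # one impression per page
                      if path.endswith(".html"))
    return hits, impressions


# One visit to a text-only page: a single file is requested.
text_only_visit = ["/plain.html"]

# One visit to a graphics-heavy page: the page itself plus 30 images.
graphic_heavy_visit = ["/home.html"] + [f"/img/{n}.gif" for n in range(30)]

print(count_hits_and_impressions(text_only_visit))      # (1, 1)
print(count_hits_and_impressions(graphic_heavy_visit))  # (31, 1)
```

Both visits count as one page impression, while the hit count differs by a factor of 31 for what is, from the reader's point of view, the same act of viewing one page.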

 

The problem with caching and hits or page impressions is that when a client or proxy server caches a page, the website logs only one access. The remote content server will never know that ten, a hundred or a thousand users have accessed a particularly popular document based on a single access by a proxy cache! Hits and page impressions offer valuable information to websites -- they help websites gauge usage and understand their users' preferences, and more importantly, websites use hit and page impression statistics to sell advertising space. Websites that log more hits and/or page impressions can sell more valuable advertising space. Many websites provide information to users free of charge and rely on selling advertising space (such as banners) in order to generate revenue. Since advertising generates their revenue, websites prize their hit and/or page impression data and dislike the fact that caching deprives them of this information.

 

While websites often prize their hit and/or page impression data, it is important to note the flaws of these measurements. I have already noted one flaw: hits do not distinguish between a one-file page and a 30-file page, and thus graphic-laden pages can greatly skew statistics if people rely on hits as a metric. Secondly, neither hits nor page impressions can tell how long a user viewed a page. Both methods register equal values whether a user viewed a page for an hour, or simply clicked through in a split second in order to reach a different destination. Thirdly, neither method can distinguish between users. A website measures the same 100 hits or page impressions whether made from one crazed fan or 100 separate fans. Advertisers obviously would prefer to reach 100 separate fans. Due to these flaws, web-savvy people know to account for the dubious figures of hit and page impression statistics. These usage statistic techniques are simply the best technology currently offers (without requiring users to log in and identify themselves or invest in additional tools to glean insights, i.e. IP address analysis). Advertisers have learned to adjust to these flaws of the hit and page impression calculations. Advertisers will likely also find ways to adjust to caching's twist on hit and page impression statistics.

 

2. Stale Documents. Caches supposedly update regularly and frequently. However, no fixed schedule exists whereby caching servers guarantee to keep their cached copies current. Vigilant cache servers may update every 30 minutes. Delinquent cache servers may not update for days or weeks.

 

When users request information from a remote website, they may in fact receive that information from a cache. If the cache information is stale (i.e. the remote website has changed its content since it was cached) the user has received, at best, outdated information and, at worst, harmful and misleading information. The degree of the threat of stale information depends on the nature of the website's content. If a user requests today's Dilbert cartoon, but receives yesterday's cartoon because the cache has not updated yet, the user suffers little harm beyond annoyance. But what happens if a user invests her money based on a cached page of the NYSE ticker page? Should she bet her money on old (even if it is only 30 minutes old) information? Additionally, what happens if a website posts information (e.g. liability inducing speech such as defamatory or obscene statements) which it pulls down a few hours later, but a server has already cached the information? Now the liability inducing information does not exist on the website, but it lives on in a cache. If a lawsuit over the information should emerge, who should a court hold liable? Caching causes websites to lose control of their content. Proliferation of outdated information on a medium known for its immediacy constitutes a large drawback for the process of caching.

 

3. Copyright Infringement. Caching web pages involves copying other people's intellectual property and may constitute copyright infringement. In order to determine whether caching constitutes a copyright infringement, one must closely examine whether caching meets the criteria for infringement and then whether cachers are eligible for the affirmative defense of fair use. I have dedicated Part III of this paper to my copyright analysis.

 

III. IS CACHING COPYRIGHT INFRINGEMENT?

 

A. WHAT IS A COPYRIGHT?

 

A copyright is a right of intellectual property. Copyright grants authors, for a limited time, certain exclusive rights to their works. Copyright is exclusively federal law, and derives from the "copyright clause" of the Constitution, which provides that "The Congress shall have Power ... To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries."

 

To be eligible for copyright protection, a work must meet two conditions: 1) it must be an original work of authorship, and 2) it must be fixed in a tangible medium of expression.

 

The U.S. Copyright Act grants a copyright owner the exclusive right to do and to authorize any of the following: (1) to reproduce the copyrighted work in copies; (2) to prepare derivative works based upon the copyrighted work; (3) to distribute copies to the public; (4) to perform the copyrighted material publicly; (5) to display the copyrighted work publicly; and (6) to digitally perform the work.

 

Caching can encroach on most of the copyright holders' six exclusive rights. First, both proxy caching and client caching implicate the copyright holders' reproduction rights because they both "reproduce" a copy into their caches. Second, proxy caching implicates the public display, public performance, and digital performance rights. To perform or display a work "publicly" means to transmit a performance or display of the work to the public (i.e. to a substantial number of people outside of a normal circle of a family and its social acquaintances), by means of any device or process, whether the members of the public receive the performance in the same place or in separate places and at the same time or at different times. Proxy caching makes its cached copy available to all those who use the proxy, which clearly places proxy caching within the definition of public display, public performance, and digital performance (the nature of the work, e.g. music or literature or computer program, determines which of these three rights is implicated). Third, proxy caching encroaches on copyright holders' distribution rights. The United States' Task Force on Intellectual Property states that making copies of a copyrighted work widely available online constitutes infringement of the copyright holder's distribution rights. Proxy caches regularly make copyrighted works widely available online to all their clients.

 

B. WHAT CONSTITUTES COPYRIGHT INFRINGEMENT?

 

To sustain a claim for copyright infringement, a plaintiff must show a) ownership of the copyright and b) copying of the protected material. A plaintiff may demonstrate ownership of her copyright by simply registering her work. One can register before or after the infringement has occurred, so a plaintiff can meet this requirement easily. In the case of caching, a plaintiff can also meet the second requirement, a showing of "copying," easily because caching, by definition, is copying.

 

Based on the ease of showing both a) ownership of one's copyright and b) copying of the protected material in the caching context, caching certainly constitutes copyright infringement. However, the copyright analysis of caching does not end here. Even when a plaintiff clearly establishes copyright infringement, a defendant may assert the affirmative defense of "fair use."

 

C. FAIR USE: AN AFFIRMATIVE DEFENSE TO COPYRIGHT INFRINGEMENT

 

Recall that the U.S. Constitution states that copyright law exists in order to promote science and the useful arts. The "fair use" doctrine allows courts to avoid rigid application of the copyright statute when a finding of infringement would actually serve to inhibit the very artistic and scientific activity copyright law strives to foster. The fair use statute states: "Fair use of a copyrighted work, including such use by reproduction in copies ... or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright." The statute declares that a court should consider the following four factors when determining whether a use of a copyrighted work constitutes a "fair use": (1) the purpose and character of the use; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work. Fair use entails a multi-factor analysis where no factor dominates. Below I break down each of the four factors of fair use, relate each factor to the web caching context, and then specifically apply the factor to a hypothetical situation in order to supply at least one "real life" example of the caching copyright conflict. The hypothetical situation involves two characters: 1) USOL, a popular Internet service provider with millions of members, and 2) "HowTired," a hip online magazine which registers upwards of 500,000 hits per day. Because HowTired is so popular with USOL's users, USOL caches HowTired's entire site on a regular basis.

 

1. Purpose and Character of the Use. In considering the purpose and character of the use, courts have looked at whether the use is for commercial or for non-profit or educational purposes. If the defendant uses the copyrighted work commercially, the use is less likely to be fair use. If the defendant uses the same work in a non-profit school to teach, the use is more likely to be fair use. Some courts might forgive commercial use if the use is a "productive" or "transformative" use. Transformative uses are uses that add value to the material taken from the copyrighted work. The Supreme Court has noted that this distinction between transformative and nontransformative uses is not wholly determinative, but can be considered when a court is balancing interests.

 

The distinction between client caching and proxy caching commands different analyses regarding the purpose and character factor. Client caches, which cache only for one client, usually do not serve any commercial purpose. On the other hand, proxy caches may cache precisely for commercial purposes. Many networks which run proxy caches charge for their services. Quicker access to more popular pages can warrant higher service fees. Consumers value speedy access. Moreover, caching saves network resources. It is more efficient to provide multiple copies direct from a proxy server than to repeatedly traverse the Internet to provide single copies from remote servers. In fact, one may analogize proxy caching to the Texaco case where a corporation made photocopies of professional journal articles for its researchers' personal files. In Texaco, the court held the defendant liable for copyright infringement for photocopying costly trade journals for staff members who wished to conveniently keep copies in their offices rather than trekking to the company library. The infringing photocopies benefited the company in two ways: 1) the company saved money by not paying for extra copies of costly trade journals; and 2) the company's employees got to work more efficiently because of the local copies. Analogously, caching benefits service providers by: 1) saving money because they can avoid buying more computer and communications equipment; and 2) giving the users better performance.

 

One may argue that caching has a transformative use. Caching adds value to the copyrighted material taken because faster Internet service produces a value that benefits the broader public interest. The more user-friendly the Internet, the more people will use it. Thus, caching encourages the growth of this promising young communications network. This argument may carry some weight, but it must be balanced with the commercial/non-commercial use analysis. Transformative use alone does not dictate a finding of fair use.

 

Extrapolating to the hypothetical situation where USOL caches HowTired, the purpose and character factor would indicate that the use is not fair use. USOL is a commercial operation. Its cached copy of the HowTired website enhances its commercial services. Through caching, USOL preserves its own resources (caching requires less telecommunications hardware, thereby reducing costs) and improves its Internet service by accelerating user access time. USOL's cached copies do not serve any direct educational purpose. One might argue that while USOL's caching does not serve an educational purpose, it does have a transformative use. The argument follows: easing access to content on the WWW furthers copyright law's goal of promoting science and the arts, thus, even if USOL's cache runs for profit, the immediate effect of caching serves the public benefit. While this argument has some merit, I believe it is out of place in the context of the "purpose and character" factor. Caching speeds access to the Internet, but USOL could also speed access by buying more hardware and faster modem connections. USOL chooses to cache because it better serves its commercial purposes.

 

2. Nature of Copyrighted Work. The second factor the courts use in evaluating fair use is whether the nature of the copyrighted work is factual or fictional. Courts consider copies of factual works more easily subject to fair use copying based on the principle that copyright law protects expression, not facts. Owners of fictional or artistic works, which are more expressive than factual works, have a stronger claim to their copyrights. Thus, one who copies a fictional or artistic work will have a much more difficult time claiming fair use.

 

Web page content varies tremendously on the Internet. Some pages simply catalog contact information such as phone numbers and addresses (e.g. www.Four11.com or www.switchboard.com). According to the Feist case, phone numbers constitute unprotectable facts. Some web pages display original literature and/or digital works of art and would warrant higher copyright protection. Obviously, in the context of web page caching, the actual content of the web page copied will influence whether the person copying may successfully claim the fair use defense.

 

When evaluating this second factor of fair use, courts also examine whether the work has been published. Courts usually consider use of an unpublished work as more likely to infringe than an analogous use of a published work. Courts grant more protection to unpublished works because: a) they have not yet taken advantage of their valuable right to first publication; and b) unauthorized use affects the copyright holder's ability to choose not to publish the work at all.

 

Although posting one's work on the WWW differs slightly from traditional publishing, a court is likely to find that all works posted on the WWW are "published" for the purposes of copyright law. The WWW is a very public forum; it has a world-wide audience whose numbers far exceed most audiences for traditionally published works. A copyright owner who posts her work on the web has already made use of her right to first publication by making the work so widely available and has already obviously chosen to make the work public. Since all works on the WWW qualify as published and therefore less needy of protection, this element would cut in favor of a court finding caching a fair use.

 

Additionally, some have tried to argue that by posting/publishing one's work on the Internet, one has granted the public an "implied license" to cache the copyrighted work. According to the law, a copyright holder cannot transfer ownership of any of her rights on an exclusive basis without a written agreement. However, a court may imply a grant of a nonexclusive license from a copyright holder's conduct. A copyright holder must exhibit conduct which leads another to reasonably believe that the copyright holder issued an authorization to copy or a waiver of her rights. Those who argue that an "implied license" to cache exists assert that because copying and caching pervade the Internet, simply posting one's material in such an environment constitutes "conduct" indicative of an intent to license the public to cache one's material. In fact, a court is unlikely to find that simply posting one's work on the Internet is conduct enough to imply the broad license to cache by any member of the public. Posting on the Internet may constitute an implied license to copy in some situations. For instance, since it is necessary for a user to copy a web page into RAM in order to view a web page, and since a user may reasonably assume that one who posts a web page wishes others to view it, a court may find that web page owners have granted users an implied license to copy works into RAM. However, a person may not reasonably assume that simply by posting one's work on the Internet, one wishes to subject her work to a cache which may deprive it of hit statistics and/or forward it to others in a stale and inaccurate form.

 

Although works on the WWW qualify as published and therefore less needy of copyright protection, this factor is unlikely to be dispositive in a fair use analysis of caching. I doubt a court would ever declare a copy of a copyrighted work fair use just because the copyrighted work was published. And, without conduct beyond simply posting one's material on the Internet, a court should not find an implied license to cache. At most, publishing would weigh in as one of many elements which would tip the scales in favor of finding a fair use.

 

The best argument cachers can assert under factor two in favor of fair use may be found in the Netcom case. In Netcom, the court found the nature of the works copied irrelevant because Netcom made copies merely to facilitate their posting. The content of the material copied made no difference to Netcom. Similar to Netcom, cache servers typically make copies merely to facilitate access to the information. The content of the material copied makes no difference to cachers. Thus, cachers can strongly argue that the precise nature of the works copied is not important to the fair use determination.

 

In the USOL/HowTired situation, USOL caches HowTired solely to facilitate user access to the popular WWW site. It would not matter to USOL if HowTired contained pure factual reports or creative fiction. Thus, a court should analogize the USOL caching situation to the Netcom situation, and proclaim this factor unimportant to the fair use determination.

 

3. Amount and Substantiality of the Copying. As a third factor in the fair use analysis, courts consider the amount and substantiality of the portion of the copyrighted work used in relation to the copyrighted work as a whole. In some cases, a court may consider a use which incorporates 90% of another copyrighted work less likely to be a fair use than a use which only incorporates 10% of another copyrighted work. However, this third factor entails a more complex analysis than just calculating percentages. A court may deny a finding of fair use even in the instance where one copies only 10% if that 10% is "qualitatively substantial." A use is qualitatively substantial if the portion copied goes to the heart of the work. Moreover, while a court can find use of a minor percentage "substantial," a court may also find use of a major percentage insubstantial and eligible for fair use. In the Sony case (in which the Supreme Court found that off-air non-archival videotaping of broadcast television was a fair use), the court found fair use despite an exact and total copy of the original. In the Sega case, the court also found fair use despite an exact and total copy of the original because the defendant needed to make the copy in order to study the original (the original work here was a computer program).

 

In order to evaluate whether a cached web page constitutes a substantial copy of the copyrighted work, the court must first define what constitutes the work in question. A client or proxy cache may only copy one page or one section of a website. Is caching one web page or section like taking a page or chapter out of a book (which would constitute a small portion of the work in question) or like taking an entire discrete article out of a magazine? Usually, a cache server only copies material a user requests. In most cases, I suspect that users request at least one discrete section of a website, not single pages out of their context. Thus, caching seems more analogous to copying an entire article out of a magazine. Copying an entire discrete section rather than a page obviously constitutes a more substantial copy and renders the use less likely to be fair use.

 

Some might argue that the fair use videotaping in Sony for the purpose of time shifting is analogous to caching material because caching serves time shifting purposes too (people cache material so they can view it at a later time if they access the site again). This argument may be persuasive in the client caching situation, especially in cases where an individual uses software specifically for the purpose of time shifting. However, the analogy to Sony does not work in the proxy cache situation. Proxy servers do not cache to shift times; they cache to make the material available for others to use.

 

In the USOL/HowTired hypothetical, USOL caches the entire HowTired site. So, USOL has little room to argue that its cached copies amount to an insubstantial amount of the copyrighted work. Further, because USOL's cache in this situation is a proxy cache (although USOL also offers its users a client cache system in its Internet browser), USOL cannot analogize to Sony. USOL caches HowTired to provide quicker access to the HowTired site to USOL users. USOL does not cache in order to time shift. USOL may try to analogize to Sega as the court did in Netcom. In Netcom, the court excused a 100% copy because, as in Sega, " ... Netcom had no practical alternative way to carry out its socially useful purpose; a Usenet server must copy all files ..." This analogy will be difficult for USOL to carry off because USOL does not need to cache in order to make the information available to its users. Investing in more telecommunications equipment to speed user access time would be a practical alternative to caching for USOL.

 

4. Effect of the Copying on the Market. The U.S. Supreme Court has stated that the fourth factor is the "single most important" in evaluating fair use. The fourth factor dictates that if the use affects the market for the copyrighted work, the court should be less likely to hold the use as a fair use.

 

It is difficult to identify the market in the case of caching because, in most cases, one does not pay to access a person's website. What market exists for material the owner distributes for free? Some argue that if a website makes its material available for free, there is no market for the material which copying can encroach upon. According to this argument, caching, by making websites more accessible, enhances the copyright holder's market. This argument rings false. While the direct consumers receive the material for free, website material can create an undeniable market for advertising dollars. The more desirable a website's material, the more dollars it can command for advertising space on its site. This advertising market model should sound familiar to any person who listens to the radio or watches television. Like many websites, many radio and television stations provide content to their audience for free and earn revenue by selling advertisers time to market their product(s) to the audience.

 

Caching hinders the market for advertising space on web pages in two ways. First, caching obscures hit and page impression statistics. Many websites form advertising contracts by guaranteeing or selling a certain number of hits or page impressions (i.e. the website promises to post an advertisement until the website logs the contracted number of hits or page impressions). Caching invalidates the statistics and thereby greatly affects the advertising market created by the copyright owner's work. Second, caching interferes with the copyright owner's control over her product. The copyright owner sells advertising space generated by her work. She may sell advertising space on a very tightly run schedule (e.g. one hour on the top page, three hours on the sports page). If a server caches a copy at 12:00, all the users of that cache (one for a client cache or up to thousands for a proxy cache) would see the advertisement posted at 12:00 even if they accessed the page at 12:30 or 1:00. Copyright owners could no longer effectively sell advertisers a certain space at a certain time.
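Both effects in the 12:00 example above can be seen in a small simulation. The following Python sketch is illustrative only: the function names, the half-hour advertising schedule, and the assumption that the proxy never refreshes its copy are all hypothetical simplifications, not a description of any real server.

```python
from typing import List, Optional, Tuple


def ad_for(minute: int) -> str:
    """Hypothetical schedule: the site rotates to a new advertisement every 30 minutes."""
    return f"ad-slot-{minute // 30}"


def serve(request_minutes: List[int], use_proxy: bool) -> Tuple[int, List[str]]:
    """Return (hits logged by the content server, advertisement each user saw).

    Assumption: when use_proxy is True, every user shares one proxy cache
    that keeps its first copy and never refreshes it.
    """
    origin_hits = 0
    cached_ad: Optional[str] = None
    seen: List[str] = []
    for minute in request_minutes:
        if use_proxy and cached_ad is not None:
            # Served from the cache: the content server logs no hit.
            seen.append(cached_ad)
            continue
        origin_hits += 1          # The request reaches the content server.
        ad = ad_for(minute)
        if use_proxy:
            cached_ad = ad        # The proxy stores the copy it just fetched.
        seen.append(ad)
    return origin_hits, seen
```

With requests at minutes 0, 30, and 60 (12:00, 12:30, and 1:00), direct access logs three hits and shows three different advertisements, while the proxy case logs a single hit and shows every user the 12:00 advertisement.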

 

In the USOL/HowTired hypothetical, USOL's proxy cache masked many thousands of hits for HowTired. HowTired provides its content for free; advertising is its sole source of revenue. HowTired guarantees its advertisers both a certain amount of display time online and a certain number of hits. USOL's caching impedes HowTired's ability to extract value when it makes such contracts with advertisers. As a result, HowTired may sell less advertising and lose revenue. By impeding the advertising market, caching discourages HowTired and other sites like it from operating. Caching damages the market for their method of business. In this way, caching runs counter to the Copyright Act's goal of encouraging creativity.

 

D. FAIR USE CONCLUSION

 

The multi-factor fair use analysis allows courts to reach very subjective conclusions. In the context of caching, courts can interpret the "purpose and character" factor as commercial or transformative. Analysis of the "nature of the copyrighted work" factor varies greatly depending on the content of the underlying web page and the cacher's purpose. The "amount and substantiality" factor requires complex quantitative as well as qualitative analysis. The final "effect upon the market" factor depends on how one chooses to define the relevant market. Therefore, no sweeping generalizations can be meaningfully made about whether caching will constitute fair use.

 

I have carried the USOL/HowTired hypothetical through the four major factors of fair use. I conclude that a court would likely hold that USOL's cache does not qualify as a fair use because: a) USOL is a profit seeking company and caching serves its commercial ends, b) USOL caches HowTired's entire site, and c) USOL's caching impairs HowTired's advertising market, its sole source of revenue. In many other situations, courts may find caching to qualify as a fair use, but in at least some situations (such as in the HowTired/USOL situation) a court may very well find that caching does not qualify as a fair use. The very possibility that one's caching will subject one to liability for "unfair" copyright infringement may be enough to chill caching on the Internet.

 

IV. WILL AND/OR SHOULD PROXY CACHING SURVIVE?

 

Various factors suggest that caching will or should survive on the Internet despite the chilling reality that a basic application of the fair use doctrine threatens to find some instances of caching ineligible for the affirmative defense. One factor which may save caching is a pre-infringement technological solution to caching's problems. The other two factors which may save caching involve a more in-depth analysis and balancing of the goals, history and policy behind copyright law and the economics of caching within copyright law.

 

A. PRE-INFRINGEMENT TECHNOLOGICAL SOLUTIONS

 

The general sentiment on the WWW seems to favor caching. Internet surfers prize expedited access and the free flow of information. The thought of copyright law putting a damper on caching upsets these users. Technological solutions may help these users avoid clashes with copyright law by resolving the problems caching creates pre-infringement.

 

Several technological solutions exist for web content providers to completely block caching of their sites. Methods such as instituting password protection or executable language protect websites from caching. Password protection requires each individual user to type in her identification information. A proxy cache server cannot perform this for each user and therefore cannot gain access into and cache a password protected site. A proxy cache similarly cannot cache executable script. While both these methods circumvent caching, they may not be ideal options for the web content provider. Web surfers generally do not like password protected sites; they like to jump freely from one site to another. Password protection -- simply pausing to remember and type in one's identification information -- deters a great many potential viewers from ever entering one's site. Executable script does not deter web surfers from entering one's site, but converting one's documents into executable script instead of the traditional HTML requires a great deal more expertise and time. Most significantly, by blocking all caching all the time, methods such as password protection and executable script defeat all the benefits of caching instead of just addressing caching's problems. It is not in most sites' best interests to defeat caching altogether. Caching, by conserving bandwidth and reducing access time, has a beneficial effect on the Internet and in turn each individual website benefits.

 

As an alternative to blocking caching proxies altogether, technology has begun to develop mechanisms which incorporate the benefits of caching while addressing one of caching's most important flaws. These mechanisms strive to ensure that proxy caches do not issue stale documents by communicating how long content servers wish their data to be distributed through proxy caches. Examples of these mechanisms are:

 

1) Content server sends expiry date/time, proxy cache does not retrieve a new copy until document expires. Documents can be sent pre-expired if they ought not be cached;

 

2) When client requests copy of document that proxy has in cache, proxy asks content server for headers including last change date/time, and only fetches fresh copy of document contents if necessary;

 

3) Proxy sends a "conditional request" to content server, "send me this document if it changed since date/time," server sends the document if it has changed, or a short standard "not modified" response if it has not.

 

Document expiry mechanisms such as these allow proxy caches and content servers to work together to reduce network overhead. Content providers can prevent the stale document problem by specifying early expiration dates for entries that they expect to change frequently, but encourage efficiency by specifying late expiration dates for entries that they expect to remain unchanged.
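The expiry and conditional-request mechanisms listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation of any actual proxy or of any proposed standard: the class names, the integer clock, and the method `get_if_modified_since` are hypothetical, and a "not modified" reply is modeled simply as `None`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Document:
    body: str
    last_modified: int  # time the content server last changed the document
    expires: int        # time from which a cached copy counts as expired


class ContentServer:
    """Hypothetical content server holding one document per URL."""

    def __init__(self) -> None:
        self.docs: Dict[str, Document] = {}
        self.full_responses = 0  # number of times a full document body was sent

    def publish(self, url: str, body: str, now: int, lifetime: int) -> None:
        # lifetime == 0 publishes the document "pre-expired" (mechanism 1).
        self.docs[url] = Document(body, last_modified=now, expires=now + lifetime)

    def get_if_modified_since(self, url: str, since: Optional[int]) -> Optional[Document]:
        """Mechanism 3: return None (a short 'not modified' reply) if unchanged."""
        doc = self.docs[url]
        if since is not None and doc.last_modified <= since:
            return None
        self.full_responses += 1
        return doc


class ProxyCache:
    """Hypothetical proxy that honors expiry times and revalidates expired copies."""

    def __init__(self, server: ContentServer) -> None:
        self.server = server
        self.store: Dict[str, Document] = {}

    def fetch(self, url: str, now: int) -> str:
        cached = self.store.get(url)
        if cached is not None and now < cached.expires:
            return cached.body  # unexpired copy: the content server is not contacted
        since = cached.last_modified if cached is not None else None
        fresh = self.server.get_if_modified_since(url, since)
        if fresh is None:
            return cached.body  # server confirmed that the cached copy is current
        self.store[url] = fresh
        return fresh.body
```

In this sketch a long lifetime saves traffic but lets the proxy keep serving an old copy until the expiry time passes, while a lifetime of zero forces a check with the content server on every request. That is the trade-off a content provider makes when choosing early or late expiration dates.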

 

Document expiry header mechanisms probably represent the practical solution for caching in the future. But these mechanisms cannot resolve caching problems today. In order for expiration dates and headers to work, 1) the information must be encoded in a standard, and 2) the proxy server must honor that standard. Today the Internet has no established standard, just some proposed standards. Because caching is relatively new and there are no standards, many cache servers do not respect the encoded expiration, many administrators do not set a realistic expiration date, and many documents carry no expiration date at all. Additionally, many of the documents on the Web are "living" documents, and specifying an expiry date for them is generally a difficult task. A document may remain unchanged for a long time and then suddenly change. The author of the document may not have foreseen this change and adjusted her expiry information accordingly. It will take some time and a significant amount of work to establish an agreed upon document expiry system and then to work the kinks out of that system.

 

B. POLICY CONSIDERATIONS FOR COPYRIGHT LAW AND CACHING POST-INFRINGEMENT: A HISTORICAL PERSPECTIVE

 

Before the days of photocopy machines and printing presses, authors had little need for copyright law protection because no one had easy mechanical means to copy another's work. As technology developed, copying became easier and authors began to need protection in order to glean the fruits from their labor. Copyright law emerged to protect authors' interests so that they would not cease to create. Copyright law is founded on the assumption that authors will decrease their output when copying prevents them from capturing revenues for their work.

 

Electronic reproduction dramatically reduces the costs associated with the production and dissemination of copies. Caching reproduces original documents with virtually no information loss, and it actually costs less (from a social perspective and perhaps from a private perspective) to cache a document than to fetch the original. These facts have led many to the apparently logical conclusion that caching must be stopped if authors are to continue to produce their works in cyberspace.

 

The apparently logical conclusion (that caching equals easy copies and must be stopped before it deters authors from producing) fails to understand the technology and realities of cyberspace. Prior to the computer age, copies made by photocopy machines and printing presses were possible, but difficult, expensive, time consuming, imperfect, and physically tangible. Copyright law predicated itself on the existing labored system of metering out copies tangibly. Today, computers and the Internet not only make copying effortless, cheap, quick, and perfect; they also make copying a fundamental necessity. Caching, along with all basic computer functions, relies on reproducing information. Computers are devices designed for rapid copying of information. In order to load its "start-up" program, a computer must "copy" the program from wherever it was previously stored into RAM for execution. To view previously stored documents in one's word processor, the computer must copy them into RAM so that they can appear on the screen. Reading one's electronic mail involves many acts of copying both the program and the letters' content. Browsing websites and viewing them on one's screen requires a computer to copy data into RAM. The Internet simply could not exist without computers engaging in constant copying.

 

In light of the digital era's reliance on copying, some view cyberspace as "a kind of 'boundary' condition where certain fundamental assumptions of the model break down." These people feel that the historical assumption -- that lower copy costs equal lower incentive for authors -- no longer holds completely true. While caching may result in less production by some authors, the new age of the information highway gives birth to a whole new breed of authors and an entirely different system of reaping profits from one's intellectual property. To caching proponents, caching is a powerful new tool for the production and distribution of creative works. They suggest that disallowing caching will hinder the Internet and in turn stifle the potential of the new breed of authors on the Internet.

 

C. POLICY CONSIDERATIONS FOR CACHING AND COPYRIGHT LAW POST-INFRINGEMENT: AN ECONOMIC ANALYSIS

 

The Coase Theorem suggests that the choice between a copyright regime which permits caching and a regime which prohibits caching is allocatively neutral (i.e. it will produce an identical allocation of caching "rights" and an identical incidence of caching behavior) as long as transaction costs are absent. However, transaction costs are never absent. In the caching scenario, the transaction costs are the costs of implementing a smooth system for those parties who wish to participate in caching. The question is who will bear the transaction costs of defining the parties' respective obligations and negotiating between willing parties.

 

Economic efficiency requires placing transaction costs on the lowest cost avoider. Under the old copyright regime where copies were difficult and more rare, copyright law declared most copying illegal, thereby placing the transaction cost burden (of negotiating a contract to copy) on the person who wished to make copies. But the ubiquity of copying on computers and the Internet may fundamentally change this analysis. Online technology demands constant copying and most parties participate in and accept constant copying. To the extent that many websites appreciate caching (especially when they can control stale information with expiration headers), it is reasonable to suggest that the likelihood of acceptance of the "I propose to copy this file -- may I do so?" bargain will be high. In such an environment, a copyright law which declares most copying illegal and forces parties who wish to copy to bear the transaction costs is no longer the most efficient arrangement. Thus, the transaction cost balance should shift.

 

A comparison of the efficiency of these alternatives depends on the distribution of contract acceptances and rejections. In a context where most copyright holders reject offers (i.e. "May I copy?" answered "No."), the common law rule, under which silence does not constitute acceptance, is sensible and efficient. In such a context the majority of transactions will consist of a single message: the copyright holder can simply ignore any offer. However, in a context like the Internet, where a copyright holder is likely to accept offers, the balance reverses. On the Internet, a silence-acceptance rule minimizes transaction costs.
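The comparison can be made concrete with a simple message count. The Python sketch below rests on assumptions that are mine rather than the argument's: every transaction begins with one message from the would-be copier, the copyright holder replies only when departing from the default that silence implies, and each message costs the same. The function name `expected_messages` is hypothetical.

```python
def expected_messages(p_accept: float, silence_means_accept: bool) -> float:
    """Expected number of messages per proposed copy under a given default rule.

    Assumptions (illustrative only): the would-be copier always sends one
    offer, and the copyright holder replies only when the answer differs
    from what silence would imply under the rule in force.
    """
    if not 0.0 <= p_accept <= 1.0:
        raise ValueError("p_accept must be a probability between 0 and 1")
    # Under a silence-acceptance rule only rejections require a reply;
    # under a rule where silence is not acceptance, only acceptances do.
    p_reply = (1.0 - p_accept) if silence_means_accept else p_accept
    return 1.0 + p_reply
```

Under these assumptions the two rules cost the same when half of all offers are accepted, and the silence-acceptance rule becomes cheaper once acceptance is more likely than rejection, which is the situation the text attributes to the Internet.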

 

CONCLUSION

 

Caching, by definition, is a form of copying. According to U.S. copyright law, caching servers, when they cache copyrighted material, commit copyright infringement. In some situations, the cacher guilty of copyright infringement may successfully assert the affirmative defense of fair use. In other situations, like the USOL/HowTired situation, a court may find that caching does not constitute a fair use. The doctrine of fair use is highly fact driven. It allows courts a great deal of room for subjectivity and engenders unpredictability in the law. Unpredictable laws, laws which do not clearly distinguish legal behavior from illegal behavior, tend to have an overbroad chilling effect. Many cachers, upon hearing of their uncertain fate, will likely choose to cease caching rather than risk facing a lawsuit and potential liability.

 

Millions of bits of data traverse the Internet every day. Much of this data is cached. The ultimate aim of the Copyright Act is not to reward the labor of authors, but to promote the progress of science and the useful arts. Copyright law needs to determine whether caching impedes or enhances this goal. Three factors cut in favor of copyright law determining that caching, at least to some extent, serves the goal of promoting art and science. First, the fact that technology can tame the drawbacks of caching while retaining most of the benefits tips the scales in favor of allowing caching. Second, history has changed basic assumptions of copyright. Before, easy copies threatened authors' ability to capture revenue. Today, authors on the Internet are devising entirely different methods of capturing revenue from their works. The supply of creative works on the Internet does not positively correlate with strong protection against caching. Third, economics indicates that a free caching default rule would efficiently lower the transaction costs of negotiating for copies in a medium dominated by copy-making.

 

While I suggest that copyright law strive to update itself and accept caching, I suspect that in reality, the law will not matter on this issue. Change occurs far faster on the Internet than in our legislative meetings or courtrooms. The parties most affected by caching, the website content providers and the owners of proxy cache servers, have ample incentives to sort this issue out amongst themselves. The threat of a copyright infringement suit will bring cachers to the bargaining table. The reality that caching benefits the Internet and should continue will bring content websites to the bargaining table. The two sides will work out a technological solution (along the lines of the expiration headers) long before copyright law catches up.